NSF PAR Search | NSF Public Access Repository


Spatial Clustering of Citizen Science Data Improves Downstream Species Distribution Models

https://doi.org/10.1609/aaai.v39i27.34993

Ahmed, Nahian; Roth, Mark; Hallman, Tyler A; Robinson, W Douglas; Hutchinson, Rebecca A (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Citizen science biodiversity data present great opportunities for ecology and conservation across vast spatial and temporal scales. However, the opportunistic nature of these data lacks the sampling structure required by modeling methodologies that address a pervasive challenge in ecological data collection: imperfect detection, i.e., the likelihood of under-observing species on field surveys. Occupancy modeling is an example of an approach that accounts for imperfect detection by explicitly modeling the observation process separately from the biological process of habitat selection. This produces species distribution models that speak to the pattern of the species on a landscape after accounting for imperfect detection in the data, rather than the pattern of species observations corrupted by errors. To achieve this benefit, occupancy models require multiple surveys of a site across which the site's status (i.e., occupied or not) is assumed constant. Since citizen science data are not collected under the required repeated-visit protocol, observations may be grouped into sites post hoc. Existing approaches for constructing sites discard some observations and/or consider only geographic distance and not environmental similarity. In this study, we compare ten approaches for site construction in terms of their impact on downstream species distribution models for 31 bird species in Oregon, using observations recorded in the eBird database. We find that occupancy models built on sites constructed by spatial clustering algorithms perform better than existing alternatives.
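As a rough illustration of the idea in this abstract, the sketch below groups checklists into sites and evaluates the standard single-season occupancy likelihood with constant occupancy probability `psi` and detection probability `p`. The grid-snapping function `group_into_sites`, its `cell_deg` parameter, and the coordinates are hypothetical stand-ins chosen for brevity. They are not one of the ten site-construction approaches compared in the paper, and the models in the paper may well use covariates rather than constants.

```python
import math
from collections import defaultdict


def group_into_sites(checklists, cell_deg=0.01):
    """Group checklists into sites by snapping coordinates to a grid.

    A deliberately simple stand-in for a spatial clustering algorithm.
    Each checklist is (lat, lon, detected) with detected in {0, 1}.
    """
    sites = defaultdict(list)
    for lat, lon, detected in checklists:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sites[key].append(detected)
    return dict(sites)


def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (constant psi and p)."""
    detections = sum(history)
    visits = len(history)
    # Occupied site: each visit detects the species with probability p.
    occupied = psi * (p ** detections) * ((1 - p) ** (visits - detections))
    # Unoccupied site: only an all-zero history is possible.
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied


def log_likelihood(sites, psi, p):
    """Sum of per-site log-likelihoods; sites are assumed independent."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in sites.values())


# Made-up checklists: two close together, one farther away.
checklists = [
    (44.5651, -123.2621, 1),
    (44.5653, -123.2624, 0),
    (44.6012, -123.1005, 0),
]
sites = group_into_sites(checklists)
print(len(sites), log_likelihood(sites, psi=0.5, p=0.5))
```

The all-zero branch is what separates "absent" from "present but missed": a site with no detections still contributes the term for an occupied site where every visit failed to detect the species.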
Free, publicly-accessible full text available April 11, 2026
Under-Counted Matrix Completion Without Detection Features

https://doi.org/10.1109/ICASSP49660.2025.10888717

Nguyen, Tri; Ibrahim, Shahana; Hutchinson, Rebecca A; Fu, Xiao (April 2025, IEEE)

Free, publicly-accessible full text available April 6, 2026
Model Evaluation for Geospatial Problems

Wang, Jing; Hallman, Tyler A; Hopkins, Laurel M; Kilbride, John B; Robinson, W Douglas; Hutchinson, Rebecca A (December 2023, CompSust-2023 2023 NeurIPS Workshop on Computational Sustainability: Pitfalls and Promises from Theory to Deployment)

Geospatial problems often involve spatial autocorrelation and covariate shift, which violate the independent, identically distributed assumption underlying standard cross-validation. In this work, we establish a theoretical criterion for unbiased cross-validation, introduce a preliminary categorization framework to guide practitioners in choosing suitable cross-validation strategies for geospatial problems, reconcile conflicting recommendations on best practices, and develop a novel, straightforward method with both theoretical guarantees and empirical success.
Full Text Available
Cross-validation for Geospatial Data: Estimating Generalization Performance in Geostatistical Problems

Wang, Jing; Hopkins, Laurel; Hallman, Tyler A; Robinson, W Douglas; Hutchinson, Rebecca A (October 2023, Transactions on Machine Learning Research)

Geostatistical learning problems are frequently characterized by spatial autocorrelation in the input features and/or the potential for covariate shift at test time. These realities violate the classical assumption of independent, identically distributed data, upon which most cross-validation algorithms rely in order to estimate the generalization performance of a model. In this paper, we present a theoretical criterion for unbiased cross-validation estimators in the geospatial setting. We also introduce a new cross-validation algorithm to evaluate models, inspired by the challenges of geospatial problems. We apply a framework for categorizing problems into different types of geospatial scenarios to help practitioners select an appropriate cross-validation strategy. Our empirical analyses compare cross-validation algorithms on both simulated and several real datasets to develop recommendations for a variety of geospatial settings. This paper aims to draw attention to some challenges that arise in model evaluation for geospatial problems and to provide guidance for users.
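To make the motivation concrete, here is a minimal sketch of spatially blocked cross-validation, a common generic strategy in which nearby points are held out together so that spatial autocorrelation leaks less information from training to test folds. It is not the new algorithm introduced in the paper. The function name `spatial_block_folds`, the `block_size` parameter, and the sample coordinates are assumptions made for illustration.

```python
import math
from collections import defaultdict


def spatial_block_folds(points, block_size, n_folds):
    """Yield (train_idx, test_idx) pairs that hold out whole spatial blocks.

    points: list of (x, y) coordinates in any projected unit.
    block_size: side length of a square block in the same unit.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(points):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)

    # Assign blocks to folds round-robin in sorted order (deterministic).
    fold_members = [[] for _ in range(n_folds)]
    for j, key in enumerate(sorted(blocks)):
        fold_members[j % n_folds].extend(blocks[key])

    for k in range(n_folds):
        test = sorted(fold_members[k])
        train = sorted(
            i
            for f, members in enumerate(fold_members)
            if f != k
            for i in members
        )
        yield train, test


# Made-up coordinates forming four well-separated blocks.
points = [(0.1, 0.1), (0.2, 0.3), (5.1, 0.2), (5.4, 0.4), (0.3, 5.2), (5.2, 5.3)]
folds = list(spatial_block_folds(points, block_size=1.0, n_folds=2))
for train, test in folds:
    print("train:", train, "test:", test)
```

Points 0 and 1 share a block, so they always land on the same side of a split. A random K-fold split would often separate them, which is one way standard cross-validation can overstate performance on spatially autocorrelated data.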
Full Text Available
A comparison of remotely sensed environmental predictors for avian distributions

https://doi.org/10.1007/s10980-022-01406-y

Hopkins, Laurel M.; Hallman, Tyler A.; Kilbride, John; Robinson, W. Douglas; Hutchinson, Rebecca A. (April 2022, Landscape Ecology)

Full Text Available
Link Prediction Under Imperfect Detection: Collaborative Filtering for Ecological Networks

https://doi.org/10.1109/TKDE.2019.2962031

Fu, Xiao; Seo, Eugene; Clarke, Justin; Hutchinson, Rebecca A. (August 2021, IEEE Transactions on Knowledge and Data Engineering)
Full Text Available
On the Role of Spatial Clustering Algorithms in Building Species Distribution Models from Community Science Data

Roth, Mark; Hallman, Tyler; Robinson, W. Douglas; Hutchinson, Rebecca A. (July 2021, ICML 2021 Workshop: Tackling Climate Change with Machine Learning)

This paper discusses opportunities for developments in spatial clustering methods to help leverage broad scale community science data for building species distribution models (SDMs). SDMs are tools that inform the science and policy needed to mitigate the impacts of climate change on biodiversity. Community science data span spatial and temporal scales unachievable by expert surveys alone, but they lack the structure imposed in smaller scale studies to allow adjustments for observational biases. Spatial clustering approaches can construct the necessary structure after surveys have occurred, but more work is needed to ensure that they are effective for this purpose. In this proposal, we describe the role of spatial clustering for realizing the potential of large biodiversity datasets, how existing methods approach this problem, and ideas for future work.
Full Text Available
Benchmark Bird Surveys Help Quantify Counting Accuracy in a Citizen-Science Database

https://doi.org/10.3389/fevo.2021.568278

Robinson, W. Douglas; Hallman, Tyler A.; Hutchinson, Rebecca A. (February 2021, Frontiers in Ecology and Evolution)
The growth of biodiversity data sets generated by citizen scientists continues to accelerate. The availability of such data has greatly expanded the scale of questions researchers can address. Yet, error, bias, and noise continue to be serious concerns for analysts, particularly when data being contributed to these giant online data sets are difficult to verify. Counts of birds contributed to eBird, the world’s largest biodiversity online database, present a potentially useful resource for tracking trends over time and space in species’ abundances. We quantified counting accuracy in a sample of 1,406 eBird checklists by comparing numbers contributed by birders (N = 246) who visited a popular birding location in Oregon, USA, with numbers generated by a professional ornithologist engaged in a long-term study creating benchmark (reference) measurements of daily bird counts. We focused on waterbirds, which are easily visible at this site. We evaluated potential predictors of count differences, including characteristics of contributed checklists, of each species, and of time of day and year. Count differences were biased toward undercounts, with more than 75% of counts being below the daily benchmark value. Median count discrepancies were −29.1% (range: 0 to −42.8%; N = 20 species). Model sets revealed an important influence of each species’ reference count, which varied seasonally as waterbird numbers fluctuated, and of percent of species known to be present each day that were included on each checklist. That is, checklists indicating a more thorough survey of the species richness at the site also had, on average, smaller count differences. However, even on checklists with the most thorough species lists, counts were biased low and exceptionally variable in their accuracy. To improve utility of such bird count data, we suggest three strategies to pursue in the future. 
(1) Assess additional options for analytically determining how to select checklists that include less biased count data, and explore options for correcting bias during the analysis stage. (2) Add options for users to provide additional information that helps analysts choose checklists, such as an option for users to tag checklists where they focused on obtaining accurate counts. (3) Explore opportunities to effectively calibrate citizen-science bird count data by establishing a formalized network of marquee sites where dedicated observers regularly contribute carefully collected benchmark data.
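The count comparison described in this abstract can be pictured with a small sketch. It assumes the discrepancy is a signed percent difference between a reported count and the benchmark count for the same species and day. The abstract does not give the exact formula, so that definition, the function names, and the sample numbers below are illustrative assumptions and not data from the study.

```python
from statistics import median


def percent_difference(reported, benchmark):
    """Signed percent difference of a reported count from the benchmark."""
    if benchmark <= 0:
        raise ValueError("benchmark count must be positive")
    return 100.0 * (reported - benchmark) / benchmark


def summarize(pairs):
    """Return (median percent difference, share of counts below benchmark)."""
    diffs = [percent_difference(r, b) for r, b in pairs]
    undercount_share = sum(d < 0 for d in diffs) / len(diffs)
    return median(diffs), undercount_share


# Made-up (reported, benchmark) pairs for one species.
pairs = [(50, 100), (80, 100), (120, 100), (90, 100)]
median_diff, undercount_share = summarize(pairs)
print(median_diff, undercount_share)
```

A negative median together with a high undercount share corresponds to the pattern the abstract reports, in which counts are biased toward values below the benchmark.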
Full Text Available
Climate Change and Local Host Availability Drive the Northern Range Boundary in the Rapid Expansion of a Specialist Insect Herbivore, Papilio cresphontes

https://doi.org/10.3389/fevo.2021.579230

Wilson, J. Keaton; Casajus, Nicolas; Hutchinson, Rebecca A.; McFarland, Kent P.; Kerr, Jeremy T.; Berteaux, Dominique; Larrivée, Maxim; Prudic, Kathleen L. (March 2021, Frontiers in Ecology and Evolution)
Species distributions, abundance, and interactions have always been influenced by human activity and are currently experiencing rapid change. Biodiversity benchmark surveys traditionally require intense human labor inputs to find, identify, and record organisms, limiting the rate and impact of scientific enquiry and discovery. Recent emergence and advancement of monitoring technologies have improved biodiversity data collection to a scale and scope previously unimaginable. Community science web platforms, smartphone applications, and technology-assisted identification have expedited the speed and enhanced the volume of observational data, all while providing open access to these data worldwide. Integrating and leveraging these data into valuable information on how species are changing in space and time requires new best practices in computational and analytical approaches. Here we integrate data from three community science repositories to explore how a specialist herbivore distribution changes in relation to host plant distributions and other environmental factors. We build a series of temporally explicit species distribution models to generate range predictions for a specialist insect herbivore (Papilio cresphontes) and three predominant host-plant species. We find that this insect species has experienced rapid northern range expansion, likely due to a combination of the range of its larval host plants and climate changes in winter. This case study shows that rapid data collection through large-scale community science endeavors can be leveraged through thoughtful data integration and transparent analytic pipelines to inform how environmental change impacts where species are and their interactions, offering a more cost-effective method of biodiversity benchmarking.
Full Text Available
Global COVID-19 lockdown highlights humans as both threats and custodians of the environment

https://doi.org/10.1016/j.biocon.2021.109175

Bates, Amanda E.; Primack, Richard B.; Biggar, Brandy S.; Bird, Tomas J.; Clinton, Mary E.; Command, Rylan J.; Richards, Cerren; Shellard, Marc; Geraldi, Nathan R.; Vergara, Valeria; et al (November 2021, Biological Conservation)

Full Text Available
